Dataset statistics
| Number of variables | 30 |
|---|---|
| Number of observations | 3772 |
| Missing cells | 6064 |
| Missing cells (%) | 5.4% |
| Duplicate rows | 33 |
| Duplicate rows (%) | 0.9% |
| Total size in memory | 884.2 KiB |
| Average record size in memory | 240.0 B |
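The percentages in this table follow directly from the counts. As a minimal sketch (the variable names are illustrative, not part of the report), the missing-cell and duplicate-row shares can be rechecked like this:

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 30
n_observations = 3772
missing_cells = 6064
duplicate_rows = 33

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Shares expressed as percentages.
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Note that the denominator for missing cells is the number of cells, not the number of rows, which is why 6064 missing cells amount to only a few percent.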
Variable types
| Numeric | 6 |
|---|---|
| Categorical | 3 |
| Boolean | 20 |
| Unsupported | 1 |
Alerts
| TBG_measured has constant value "False" | Constant |
|---|---|
| Dataset has 33 (0.9%) duplicate rows | Duplicates |
| TSH is highly overall correlated with Class | High correlation |
| T3 is highly overall correlated with TT4 and 2 other fields | High correlation |
| TT4 is highly overall correlated with T3 and 2 other fields | High correlation |
| T4U is highly overall correlated with pregnant and 1 other field | High correlation |
| FTI is highly overall correlated with T3 and 2 other fields | High correlation |
| pregnant is highly overall correlated with T4U | High correlation |
| psych is highly overall correlated with referral_source | High correlation |
| TSH_measured is highly overall correlated with T3_measured and 3 other fields | High correlation |
| T3_measured is highly overall correlated with TSH_measured and 1 other field | High correlation |
| TT4_measured is highly overall correlated with TSH_measured and 3 other fields | High correlation |
| T4U_measured is highly overall correlated with TSH_measured and 2 other fields | High correlation |
| FTI_measured is highly overall correlated with TSH_measured and 2 other fields | High correlation |
| referral_source is highly overall correlated with psych | High correlation |
| Class is highly overall correlated with TSH and 2 other fields | High correlation |
| sex has 150 (4.0%) missing values | Missing |
| TSH has 369 (9.8%) missing values | Missing |
| T3 has 769 (20.4%) missing values | Missing |
| TT4 has 231 (6.1%) missing values | Missing |
| T4U has 387 (10.3%) missing values | Missing |
| FTI has 385 (10.2%) missing values | Missing |
| TBG has 3772 (100.0%) missing values | Missing |
| TBG is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
Reproduction
| Analysis started | 2023-03-07 05:15:28.488424 |
|---|---|
| Analysis finished | 2023-03-07 05:16:37.989797 |
| Duration | 1 minute and 9.5 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
age
Real number (ℝ)
| Distinct | 93 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51.735879 |
| Minimum | 1 |
| Maximum | 455 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 36 |
| median | 54 |
| Q3 | 67 |
| 95-th percentile | 79 |
| Maximum | 455 |
| Range | 454 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 20.084958 |
|---|---|
| Coefficient of variation (CV) | 0.38822107 |
| Kurtosis | 41.86283 |
| Mean | 51.735879 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | 1.9558145 |
| Sum | 195096 |
| Variance | 403.40555 |
| Monotonicity | Not monotonic |
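Several of the figures above are derived from one another. The sketch below recomputes them from the reported values and applies Tukey's 1.5 × IQR rule of thumb, which is not part of the report and is used here only as an illustrative check on the maximum of 455:

```python
# Figures copied from the "age" tables above.
mean, std = 51.735879, 20.084958
q1, q3 = 36, 67
minimum, maximum = 1, 455

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum

# Tukey's rule of thumb: values above Q3 + 1.5 * IQR are candidate outliers.
upper_fence = q3 + 1.5 * iqr

print(f"CV={cv:.4f} variance={variance:.2f} IQR={iqr} range={value_range}")
print(f"upper fence={upper_fence}, maximum above fence: {maximum > upper_fence}")
```

The maximum lies far above the fence, which suggests the single value 455 deserves a closer look before modelling.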
| Value | Count | Frequency (%) |
| 59 | 95 | 2.5% |
| 60 | 91 | 2.4% |
| 70 | 90 | 2.4% |
| 73 | 81 | 2.1% |
| 55 | 81 | 2.1% |
| 63 | 78 | 2.1% |
| 72 | 77 | 2.0% |
| 58 | 77 | 2.0% |
| 62 | 75 | 2.0% |
| 61 | 74 | 2.0% |
| Other values (83) | 2952 |
| Value | Count | Frequency (%) |
| 1 | 6 | |
| 2 | 4 | |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 5 | |
| 8 | 3 | |
| 10 | 1 | < 0.1% |
| 11 | 4 | |
| 12 | 4 |
| Value | Count | Frequency (%) |
| 455 | 1 | < 0.1% |
| 94 | 2 | 0.1% |
| 93 | 2 | 0.1% |
| 92 | 2 | 0.1% |
| 91 | 2 | 0.1% |
| 90 | 5 | |
| 89 | 8 | |
| 88 | 9 | |
| 87 | 12 | |
| 86 | 6 |
sex
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 150 |
| Missing (%) | 4.0% |
| Memory size | 29.6 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3622 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | F |
|---|---|
| 2nd row | F |
| 3rd row | M |
| 4th row | F |
| 5th row | F |
Common Values
| Value | Count | Frequency (%) |
| F | 2480 | |
| M | 1142 | |
| (Missing) | 150 | 4.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| F | 2480 | |
| M | 1142 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 3622 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 2480 | |
| M | 1142 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3622 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| F | 2480 | |
| M | 1142 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3622 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| F | 2480 | |
| M | 1142 |
on_thyroxine
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3308 | |
| True | 464 | 12.3% |
query_on_thyroxine
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3722 | |
| True | 50 | 1.3% |
on_antithyroid_medication
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3729 | |
| True | 43 | 1.1% |
sick
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3625 | |
| True | 147 | 3.9% |
pregnant
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3719 | |
| True | 53 | 1.4% |
thyroid_surgery
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3719 | |
| True | 53 | 1.4% |
I131_treatment
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3713 | |
| True | 59 | 1.6% |
query_hypothyroid
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3538 | |
| True | 234 | 6.2% |
query_hyperthyroid
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3535 | |
| True | 237 | 6.3% |
lithium
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3754 | |
| True | 18 | 0.5% |
goitre
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3738 | |
| True | 34 | 0.9% |
tumor
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3676 | |
| True | 96 | 2.5% |
hypopituitary
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3771 | |
| True | 1 | < 0.1% |
psych
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3588 | |
| True | 184 | 4.9% |
TSH_measured
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| True | 3403 | |
| False | 369 | 9.8% |
TSH
Real number (ℝ)
| Distinct | 287 |
|---|---|
| Distinct (%) | 8.4% |
| Missing | 369 |
| Missing (%) | 9.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.0867661 |
| Minimum | 0.005 |
| Maximum | 530 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 0.005 |
|---|---|
| 5-th percentile | 0.025 |
| Q1 | 0.5 |
| median | 1.4 |
| Q3 | 2.7 |
| 95-th percentile | 13 |
| Maximum | 530 |
| Range | 529.995 |
| Interquartile range (IQR) | 2.2 |
Descriptive statistics
| Standard deviation | 24.52147 |
|---|---|
| Coefficient of variation (CV) | 4.8206405 |
| Kurtosis | 238.18146 |
| Mean | 5.0867661 |
| Median Absolute Deviation (MAD) | 1.04 |
| Skewness | 13.882653 |
| Sum | 17310.265 |
| Variance | 601.30251 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.2 | 116 | 3.1% |
| 1.3 | 105 | 2.8% |
| 1.1 | 97 | 2.6% |
| 1.4 | 91 | 2.4% |
| 1.5 | 80 | 2.1% |
| 1.9 | 79 | 2.1% |
| 1.2 | 79 | 2.1% |
| 1.6 | 78 | 2.1% |
| 1.7 | 73 | 1.9% |
| 2.3 | 70 | 1.9% |
| Other values (277) | 2535 | |
| (Missing) | 369 | 9.8% |
| Value | Count | Frequency (%) |
| 0.005 | 52 | |
| 0.01 | 24 | |
| 0.015 | 26 | |
| 0.02 | 55 | |
| 0.025 | 17 | 0.5% |
| 0.03 | 25 | |
| 0.035 | 19 | 0.5% |
| 0.04 | 17 | 0.5% |
| 0.045 | 13 | 0.3% |
| 0.05 | 50 |
| Value | Count | Frequency (%) |
| 530 | 1 | |
| 478 | 1 | |
| 472 | 1 | |
| 468 | 1 | |
| 440 | 1 | |
| 400 | 1 | |
| 236 | 1 | |
| 230 | 1 | |
| 199 | 1 | |
| 188 | 1 |
T3_measured
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| True | 3003 | |
| False | 769 | 20.4% |
T3
Real number (ℝ)
| Distinct | 69 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 769 |
| Missing (%) | 20.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.0134998 |
| Minimum | 0.05 |
| Maximum | 10.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 0.05 |
|---|---|
| 5-th percentile | 0.8 |
| Q1 | 1.6 |
| median | 2 |
| Q3 | 2.4 |
| 95-th percentile | 3.4 |
| Maximum | 10.6 |
| Range | 10.55 |
| Interquartile range (IQR) | 0.8 |
Descriptive statistics
| Standard deviation | 0.82743419 |
|---|---|
| Coefficient of variation (CV) | 0.41094326 |
| Kurtosis | 9.8680496 |
| Mean | 2.0134998 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 1.730874 |
| Sum | 6046.54 |
| Variance | 0.68464734 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 238 | 6.3% |
| 1.8 | 207 | 5.5% |
| 2.2 | 201 | 5.3% |
| 1.9 | 189 | 5.0% |
| 2.1 | 184 | 4.9% |
| 2.3 | 183 | 4.9% |
| 1.6 | 159 | 4.2% |
| 1.7 | 157 | 4.2% |
| 1.5 | 141 | 3.7% |
| 2.4 | 137 | 3.6% |
| Other values (59) | 1207 | |
| (Missing) | 769 |
| Value | Count | Frequency (%) |
| 0.05 | 2 | 0.1% |
| 0.1 | 2 | 0.1% |
| 0.2 | 18 | |
| 0.3 | 22 | |
| 0.4 | 20 | |
| 0.5 | 16 | 0.4% |
| 0.6 | 20 | |
| 0.7 | 32 | |
| 0.8 | 40 | |
| 0.9 | 42 |
| Value | Count | Frequency (%) |
| 10.6 | 1 | |
| 8.5 | 1 | |
| 7.6 | 1 | |
| 7.3 | 1 | |
| 7.1 | 2 | |
| 7 | 1 | |
| 6.7 | 1 | |
| 6.6 | 1 | |
| 6.2 | 1 | |
| 6.1 | 1 |
TT4_measured
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| True | 3541 | |
| False | 231 | 6.1% |
TT4
Real number (ℝ)
| Distinct | 241 |
|---|---|
| Distinct (%) | 6.8% |
| Missing | 231 |
| Missing (%) | 6.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 108.31934 |
| Minimum | 2 |
| Maximum | 430 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 63 |
| Q1 | 88 |
| median | 103 |
| Q3 | 124 |
| 95-th percentile | 170 |
| Maximum | 430 |
| Range | 428 |
| Interquartile range (IQR) | 36 |
Descriptive statistics
| Standard deviation | 35.604248 |
|---|---|
| Coefficient of variation (CV) | 0.32869704 |
| Kurtosis | 6.6184389 |
| Mean | 108.31934 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | 1.267704 |
| Sum | 383558.8 |
| Variance | 1267.6624 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 101 | 71 | 1.9% |
| 93 | 67 | 1.8% |
| 103 | 63 | 1.7% |
| 98 | 63 | 1.7% |
| 102 | 59 | 1.6% |
| 87 | 59 | 1.6% |
| 94 | 56 | 1.5% |
| 91 | 56 | 1.5% |
| 99 | 55 | 1.5% |
| 120 | 54 | 1.4% |
| Other values (231) | 2938 | |
| (Missing) | 231 | 6.1% |
| Value | Count | Frequency (%) |
| 2 | 1 | < 0.1% |
| 2.9 | 1 | < 0.1% |
| 3 | 2 | 0.1% |
| 4 | 1 | < 0.1% |
| 4.8 | 1 | < 0.1% |
| 5.8 | 2 | 0.1% |
| 6 | 1 | < 0.1% |
| 9.5 | 1 | < 0.1% |
| 10 | 5 | |
| 11 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 430 | 2 | |
| 372 | 1 | |
| 301 | 1 | |
| 289 | 1 | |
| 273 | 1 | |
| 272 | 1 | |
| 263 | 1 | |
| 261 | 1 | |
| 258 | 1 | |
| 257 | 1 |
T4U_measured
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| True | 3385 | |
| False | 387 | 10.3% |
T4U
Real number (ℝ)
| Distinct | 146 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 387 |
| Missing (%) | 10.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9949997 |
| Minimum | 0.25 |
| Maximum | 2.32 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 0.25 |
|---|---|
| 5-th percentile | 0.74 |
| Q1 | 0.88 |
| median | 0.98 |
| Q3 | 1.08 |
| 95-th percentile | 1.34 |
| Maximum | 2.32 |
| Range | 2.07 |
| Interquartile range (IQR) | 0.2 |
Descriptive statistics
| Standard deviation | 0.19545728 |
|---|---|
| Coefficient of variation (CV) | 0.19643953 |
| Kurtosis | 4.0734715 |
| Mean | 0.9949997 |
| Median Absolute Deviation (MAD) | 0.1 |
| Skewness | 1.2326742 |
| Sum | 3368.074 |
| Variance | 0.038203546 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.99 | 95 | 2.5% |
| 0.9 | 93 | 2.5% |
| 1.01 | 91 | 2.4% |
| 1 | 90 | 2.4% |
| 0.92 | 89 | 2.4% |
| 0.93 | 88 | 2.3% |
| 0.97 | 88 | 2.3% |
| 1.02 | 87 | 2.3% |
| 0.91 | 85 | 2.3% |
| 0.95 | 83 | 2.2% |
| Other values (136) | 2496 | |
| (Missing) | 387 | 10.3% |
| Value | Count | Frequency (%) |
| 0.25 | 1 | < 0.1% |
| 0.31 | 1 | < 0.1% |
| 0.36 | 1 | < 0.1% |
| 0.38 | 1 | < 0.1% |
| 0.41 | 1 | < 0.1% |
| 0.46 | 1 | < 0.1% |
| 0.47 | 1 | < 0.1% |
| 0.48 | 2 | |
| 0.49 | 1 | < 0.1% |
| 0.5 | 3 |
| Value | Count | Frequency (%) |
| 2.32 | 1 | |
| 2.12 | 1 | |
| 2.03 | 1 | |
| 2.01 | 1 | |
| 1.97 | 1 | |
| 1.94 | 1 | |
| 1.93 | 1 | |
| 1.88 | 2 | |
| 1.84 | 1 | |
| 1.83 | 2 |
FTI_measured
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| True | 3387 | |
| False | 385 | 10.2% |
FTI
Real number (ℝ)
| Distinct | 234 |
|---|---|
| Distinct (%) | 6.9% |
| Missing | 385 |
| Missing (%) | 10.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 110.46965 |
| Minimum | 2 |
| Maximum | 395 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 29.6 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 68 |
| Q1 | 93 |
| median | 107 |
| Q3 | 124 |
| 95-th percentile | 166 |
| Maximum | 395 |
| Range | 393 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 33.089698 |
|---|---|
| Coefficient of variation (CV) | 0.29953655 |
| Kurtosis | 7.874558 |
| Mean | 110.46965 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | 1.3454318 |
| Sum | 374160.7 |
| Variance | 1094.9281 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 100 | 73 | 1.9% |
| 93 | 70 | 1.9% |
| 114 | 65 | 1.7% |
| 98 | 64 | 1.7% |
| 107 | 64 | 1.7% |
| 92 | 63 | 1.7% |
| 104 | 63 | 1.7% |
| 106 | 59 | 1.6% |
| 101 | 59 | 1.6% |
| 97 | 59 | 1.6% |
| Other values (224) | 2748 | |
| (Missing) | 385 | 10.2% |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 2.8 | 1 | |
| 3 | 2 | |
| 4 | 1 | |
| 5.4 | 1 | |
| 7 | 1 | |
| 7.6 | 1 | |
| 8.4 | 1 | |
| 8.5 | 1 | |
| 8.9 | 1 |
| Value | Count | Frequency (%) |
| 395 | 2 | |
| 362 | 1 | |
| 349 | 1 | |
| 312 | 1 | |
| 291 | 1 | |
| 283 | 1 | |
| 281 | 1 | |
| 280 | 1 | |
| 274 | 1 | |
| 265 | 1 |
TBG_measured
Boolean
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.8 KiB |
| Value | Count | Frequency (%) |
| False | 3772 |
referral_source
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 29.6 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.3093849 |
| Min length | 3 |
Characters and Unicode
| Total characters | 16255 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | SVHC |
|---|---|
| 2nd row | other |
| 3rd row | other |
| 4th row | other |
| 5th row | SVI |
Common Values
| Value | Count | Frequency (%) |
| other | 2201 | |
| SVI | 1034 | |
| SVHC | 386 | 10.2% |
| STMW | 112 | 3.0% |
| SVHD | 39 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 2201 | |
| t | 2201 | |
| h | 2201 | |
| e | 2201 | |
| r | 2201 | |
| S | 1571 | |
| V | 1459 | |
| I | 1034 | |
| H | 425 | 2.6% |
| C | 386 | 2.4% |
| Other values (4) | 375 | 2.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 11005 | |
| Uppercase Letter | 5250 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1571 | |
| V | 1459 | |
| I | 1034 | |
| H | 425 | 8.1% |
| C | 386 | 7.4% |
| T | 112 | 2.1% |
| M | 112 | 2.1% |
| W | 112 | 2.1% |
| D | 39 | 0.7% |
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 2201 | |
| t | 2201 | |
| h | 2201 | |
| e | 2201 | |
| r | 2201 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16255 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 2201 | |
| t | 2201 | |
| h | 2201 | |
| e | 2201 | |
| r | 2201 | |
| S | 1571 | |
| V | 1459 | |
| I | 1034 | |
| H | 425 | 2.6% |
| C | 386 | 2.4% |
| Other values (4) | 375 | 2.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16255 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 2201 | |
| t | 2201 | |
| h | 2201 | |
| e | 2201 | |
| r | 2201 | |
| S | 1571 | |
| V | 1459 | |
| I | 1034 | |
| H | 425 | 2.6% |
| C | 386 | 2.4% |
| Other values (4) | 375 | 2.3% |
Class
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 29.6 KiB |
Length
| Max length | 23 |
|---|---|
| Median length | 8 |
| Mean length | 9.0554083 |
| Min length | 8 |
Characters and Unicode
| Total characters | 34157 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | negative |
|---|---|
| 2nd row | negative |
| 3rd row | negative |
| 4th row | negative |
| 5th row | negative |
Common Values
| Value | Count | Frequency (%) |
| negative | 3481 | |
| compensated_hypothyroid | 194 | 5.1% |
| primary_hypothyroid | 95 | 2.5% |
| secondary_hypothyroid | 2 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 7352 | |
| t | 3966 | |
| i | 3867 | |
| a | 3772 | |
| n | 3677 | |
| g | 3481 | |
| v | 3481 | |
| o | 778 | 2.3% |
| y | 679 | 2.0% |
| h | 582 | 1.7% |
| Other values (7) | 2522 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 33866 | |
| Connector Punctuation | 291 | 0.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 7352 | |
| t | 3966 | |
| i | 3867 | |
| a | 3772 | |
| n | 3677 | |
| g | 3481 | |
| v | 3481 | |
| o | 778 | 2.3% |
| y | 679 | 2.0% |
| h | 582 | 1.7% |
| Other values (6) | 2231 | 6.6% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 291 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 33866 | |
| Common | 291 | 0.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 7352 | |
| t | 3966 | |
| i | 3867 | |
| a | 3772 | |
| n | 3677 | |
| g | 3481 | |
| v | 3481 | |
| o | 778 | 2.3% |
| y | 679 | 2.0% |
| h | 582 | 1.7% |
| Other values (6) | 2231 | 6.6% |
Common
| Value | Count | Frequency (%) |
| _ | 291 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 34157 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 7352 | |
| t | 3966 | |
| i | 3867 | |
| a | 3772 | |
| n | 3677 | |
| g | 3481 | |
| v | 3481 | |
| o | 778 | 2.3% |
| y | 679 | 2.0% |
| h | 582 | 1.7% |
| Other values (7) | 2522 | 7.4% |
Auto
The auto setting is an interpretable pairwise column metric that applies the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramer's V, [0,1]
- Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
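The mapping above amounts to a small dispatch on the pair of column types. The function below is a hypothetical sketch of that rule, not the library's actual implementation; the name `auto_metric` and the type labels are assumptions made for illustration.

```python
def auto_metric(type_a: str, type_b: str) -> str:
    """Return the metric named in the mapping above for a pair of column types."""
    pair = frozenset((type_a, type_b))
    if pair == {"numerical"}:
        # Numerical-Numerical
        return "Spearman's rho"
    if pair <= {"numerical", "categorical"}:
        # Categorical-Categorical and Numerical-Categorical;
        # a numerical column would be discretized before the comparison.
        return "Cramer's V"
    raise ValueError(f"unsupported type pair: {type_a}, {type_b}")


print(auto_metric("numerical", "numerical"))
print(auto_metric("categorical", "numerical"))
```

Using a set for the pair makes the lookup independent of argument order, which matches the symmetric nature of the metrics.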
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
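A minimal standard-library sketch of this calculation: rank both variables (ties share the mean of their positions), then compute the covariance of the ranks divided by the product of their standard deviations. The helper names and the toy data are illustrative only.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

On this toy input the rank-based coefficient is essentially 1 while the plain Pearson coefficient is noticeably lower, which illustrates the point about nonlinear monotonic relationships.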
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
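The definition and the invariance property can be checked with a few lines of standard-library Python. This is a sketch on made-up data; `pearson_r` is an illustrative helper, not a function from the profiling library.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```

Both calls give the same value up to floating-point error, because the shifts cancel in the centred terms and the scale factors cancel between numerator and denominator.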
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
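The pair-counting definition given here corresponds to the variant usually called τ-a. The sketch below implements exactly that definition on toy data; libraries often report tie-adjusted variants instead, so values may differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```

In this example five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.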
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | age | sex | on_thyroxine | query_on_thyroxine | on_antithyroid_medication | sick | pregnant | thyroid_surgery | I131_treatment | query_hypothyroid | query_hyperthyroid | lithium | goitre | tumor | hypopituitary | psych | TSH_measured | TSH | T3_measured | T3 | TT4_measured | TT4 | T4U_measured | T4U | FTI_measured | FTI | TBG_measured | TBG | referral_source | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 1.3 | t | 2.5 | t | 125 | t | 1.14 | t | 109 | f | NaN | SVHC | negative |
| 1 | 23 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 4.1 | t | 2 | t | 102 | f | NaN | f | NaN | f | NaN | other | negative |
| 2 | 46 | M | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 0.98 | f | NaN | t | 109 | t | 0.91 | t | 120 | f | NaN | other | negative |
| 3 | 70 | F | t | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 0.16 | t | 1.9 | t | 175 | f | NaN | f | NaN | f | NaN | other | negative |
| 4 | 70 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 0.72 | t | 1.2 | t | 61 | t | 0.87 | t | 70 | f | NaN | SVI | negative |
| 5 | 18 | F | t | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 0.03 | f | NaN | t | 183 | t | 1.3 | t | 141 | f | NaN | other | negative |
| 6 | 59 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | t | 72 | t | 0.92 | t | 78 | f | NaN | other | negative |
| 7 | 80 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 2.2 | t | 0.6 | t | 80 | t | 0.7 | t | 115 | f | NaN | SVI | negative |
| 8 | 66 | F | f | f | f | f | f | f | f | f | f | f | f | t | f | f | t | 0.6 | t | 2.2 | t | 123 | t | 0.93 | t | 132 | f | NaN | SVI | negative |
| 9 | 68 | M | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 2.4 | t | 1.6 | t | 83 | t | 0.89 | t | 93 | f | NaN | SVI | negative |
Last rows
| | age | sex | on_thyroxine | query_on_thyroxine | on_antithyroid_medication | sick | pregnant | thyroid_surgery | I131_treatment | query_hypothyroid | query_hyperthyroid | lithium | goitre | tumor | hypopituitary | psych | TSH_measured | TSH | T3_measured | T3 | TT4_measured | TT4 | T4U_measured | T4U | FTI_measured | FTI | TBG_measured | TBG | referral_source | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3762 | 61 | M | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | t | 1 | t | 72 | t | 0.7 | t | 103 | f | NaN | other | negative |
| 3763 | 41 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | other | negative |
| 3764 | 54 | M | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 1.2 | t | 1.9 | t | 89 | t | 0.85 | t | 104 | f | NaN | SVI | negative |
| 3765 | 73 | F | t | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 8.5 | t | 2.1 | t | 104 | t | 1.13 | t | 92 | f | NaN | SVI | negative |
| 3766 | 19 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 8.8 | t | 2.7 | t | 108 | t | 1.11 | t | 97 | f | NaN | other | compensated_hypothyroid |
| 3767 | 30 | F | f | f | f | f | f | f | f | f | f | f | f | t | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | other | negative |
| 3768 | 68 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 1 | t | 2.1 | t | 124 | t | 1.08 | t | 114 | f | NaN | SVI | negative |
| 3769 | 74 | F | f | f | f | f | f | f | f | f | t | f | f | f | f | f | t | 5.1 | t | 1.8 | t | 112 | t | 1.07 | t | 105 | f | NaN | other | negative |
| 3770 | 72 | M | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 0.7 | t | 2 | t | 82 | t | 0.94 | t | 87 | f | NaN | SVI | negative |
| 3771 | 64 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | t | 1 | t | 2.2 | t | 99 | t | 1.07 | t | 92 | f | NaN | other | negative |
Most frequently occurring
| | age | sex | on_thyroxine | query_on_thyroxine | on_antithyroid_medication | sick | pregnant | thyroid_surgery | I131_treatment | query_hypothyroid | query_hyperthyroid | lithium | goitre | tumor | hypopituitary | psych | TSH_measured | TSH | T3_measured | T3 | TT4_measured | TT4 | T4U_measured | T4U | FTI_measured | FTI | TBG_measured | referral_source | Class | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 26 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 6 |
| 5 | 29 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 7 | 32 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 8 | 33 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 15 | 41 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 17 | 51 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 21 | 57 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 24 | 58 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 4 |
| 1 | 19 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 3 |
| 3 | 22 | F | f | f | f | f | f | f | f | f | f | f | f | f | f | f | f | NaN | f | NaN | f | NaN | f | NaN | f | NaN | f | other | negative | 3 |
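A table like the one above can be produced by counting identical rows. The sketch below does this with the standard library on a hypothetical miniature dataset (four columns, invented rows), so the counts are not those of the real data.

```python
from collections import Counter

# Hypothetical miniature rows (age, sex, referral_source, Class); not the real data.
rows = [
    (26, "F", "other", "negative"),
    (26, "F", "other", "negative"),
    (29, "F", "other", "negative"),
    (26, "F", "other", "negative"),
    (41, "M", "SVI", "negative"),
]

counts = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]
print(duplicates)
```

Rows must be hashable for this approach, which is why each row is stored as a tuple. Rows containing missing values need extra care, since a NaN does not compare equal to another NaN.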